Abstract

This paper proposes a method for constructing approximate feasible primal solutions
from dual ones for large-scale optimization problems possessing certain separability
properties. Whereas infeasible primal estimates can typically be produced from
(sub-)gradients of the dual function, it is often not easy to project them to the primal
feasible set, since the projection itself has a complexity comparable to the complexity of
the initial problem. We propose an alternative efficient method to obtain feasibility and
show that its properties influencing the convergence to the optimum are similar to the
properties of the Euclidean projection. We apply our method to the local polytope relaxation
of inference problems for Markov Random Fields and demonstrate its superiority over
existing methods.



1 Introduction 

Convex relaxations of combinatorial problems appearing in computer vision, processing of
medical data, or analysis of transport networks often contain millions of variables and
hundreds of thousands of constraints. It is also quite common to employ their dual
formulations to allow for more efficient optimization, which due to strong duality also
delivers primal solutions. Indeed, approximate primal solutions can usually be reconstructed
from (sub-)gradients of the dual objective. However, these are typically infeasible. Because
of the problem size, only first-order methods (based only on evaluations of the function and
its (sub-)gradient) can be applied. Since feasibility is not guaranteed before the optimum is
reached, it is hardly attainable for such methods because of their slow convergence. The
classical trick, projection to the feasible set, cannot be used efficiently because of the
problem size.

A striking example of such a situation, which we explore in the paper, is the reconstruction
of feasible primal estimates for local polytope relaxations of Markov random field (MRF)
inference problems [Schlesinger, 1976; Werner, 2007; Wainwright and Jordan, 2008].



Motivation: Why Feasible Primal Estimates Are Needed. It is often the case for
convex relaxations of combinatorial problems that not a relaxed solution, but an integer
approximation thereof is used in applications. Such integer primal solutions can be obtained
from the dual ones due to the complementary slackness condition and using heuristic local
search procedures [Werner, 2007; Kolmogorov, 2006; Ravikumar et al., 2010]. However, a
sequence of feasible solution estimates of the relaxed problem converging to the optimum
guarantees vanishing of the corresponding duality gap, and hence (i) determines a
theoretically sound stopping condition [Boyd and Vandenberghe, 2004]; (ii) provides a basis
for the comparison of different optimization schemes for a given problem; (iii) allows for the
construction of adaptive optimization schemes depending on the duality gap, for example
adaptive step-size selection in subgradient-based schemes [Komodakis et al., 2011; Kappes et
al., 2012] or adaptive smoothing selection procedures for non-smooth problems [Savchynskyy et
al., 2012]. Another example is the tightening of relaxations with cutting-plane based
approaches [Sontag et al., 2008].



Contribution. We propose an efficient and well-scalable method for constructing feasible
points from infeasible ones for a certain class of separable convex problems. The method
guarantees convergence of the constructed feasible point sequence to the optimum of the
problem, provided that this convergence holds for their infeasible counterparts. We
theoretically and empirically show how this method works in a local polytope relaxation
framework for MRF inference problems. We formulate and prove our results in a general way,
which allows them to be applied to arbitrary convex optimization problems having a similar
separable structure.



Formulation of the Main Result. We start by stating the main result of the paper for
a separable linear programming problem. The result has a special form, which appears in
the MRF energy minimization problem. This example illustrates the idea of the method and
avoids obscuring it with numerous technical details. We refer to Sections 2 and 3 for all
proofs, special cases and generalizations.

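Let $\langle \cdot, \cdot \rangle$ denote an inner product of two vectors in a Euclidean space. Let
$\mathbb{R}^n_+$ denote the non-negative cone of the $n$-dimensional Euclidean space $\mathbb{R}^n$.
Let $I = \{1, \dots, N\}$, $J = \{1, \dots, M\}$ be sets of integer indexes and $\mathcal{N}(j)$,
$j \in J$, be a collection of subsets of $I$. Let further $x \in \mathbb{R}^{nN}$ be a collection of
$(x_i \in \mathbb{R}^n,\ i \in I)$ and $y \in \mathbb{R}^{mM}$ denote $(y_j \in \mathbb{R}^m,\ j \in J)$.
Let $A_{ij}$, $i \in I$, $j \in J$ and $B_i$, $i \in I$ be matrices of dimensions $n \times m$ and
$k \times n$ for some $k < n$ and let $c_i \in \mathbb{R}^k$. Consider the following separable linear
programming problem in the standard form

$$
\begin{aligned}
\min_{x \in \mathbb{R}^{nN}_+,\ y \in \mathbb{R}^{mM}_+} \quad & \sum_{i=1}^{N} \langle a_i, x_i \rangle + \sum_{j=1}^{M} \langle b_j, y_j \rangle \\
\text{s.t.} \quad & A_{ij} y_j = x_i, \quad i \in \mathcal{N}(j),\ j \in J, \\
& B_i x_i = c_i, \quad i \in I.
\end{aligned}
\tag{1}
$$

Let $D$ be the feasible set of the problem (1) and the mapping
$\mathcal{P} : \mathbb{R}^{nN} \times \mathbb{R}^{mM} \to D$ be defined such that
$\mathcal{P}(x, y) = (x', y')$, where

$$x'_i,\ i \in I, \text{ are Euclidean projections of } x_i \text{ to the sets } \{x_i \in \mathbb{R}^n_+ : B_i x_i = c_i\}; \tag{2}$$

$$y'_j = \arg\min_{y_j \in \mathbb{R}^m_+} \langle b_j, y_j \rangle \quad \text{s.t.} \quad A_{ij} y_j = x'_i,\ i \in \mathcal{N}(j). \tag{3}$$

The main result of this paper states that from the convergence of
$(x^t, y^t) \in \mathbb{R}^{nN} \times \mathbb{R}^{mM}$, $t = 1, 2, \dots, \infty$ to the set of optimal
solutions of (1) it follows that $\mathcal{P}(x^t, y^t)$ converges to the set of optimal solutions as
well.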

Please note that

• $\mathcal{P}(x^t, y^t)$ is always feasible due to its construction;
• contrary to the Euclidean projection onto the set $D$, to compute $\mathcal{P}(x^t, y^t)$ one has to
solve many, but small quadratic and linear optimization problems (2)-(3), assuming
that $n \ll |I|$, $m \ll |J|$ and $|\mathcal{N}(j)| \ll |I|$. To this end such powerful, but not very
well scalable tools as simplex or interior point methods can be used due to the small size of
these problems.

In Section 5 we additionally show how the convergence speed of $\mathcal{P}(x^t, y^t)$ depends on
the coefficients $a_i$ and $b_j$.

Assuming that the set $D$ corresponds to the local polytope, variables $x_i$ and $y_j$ to unary
and binary "max-marginals" and weights $a_i$ and $b_j$ to unary and pairwise potentials
respectively, this result allows for an efficient estimation of feasible primal points from
infeasible ones for MRF energy minimization algorithms, which has been considered a
non-trivial problem in the past [Werner, 2007].



Related Work on MRF Inference. The two most important inference problems for
MRFs are maximum a posteriori (MAP) inference and marginalization [Wainwright and Jordan,
2008]. Both are intractable in general and thus both require some relaxation. The simplest
convex relaxation for both is based on exchanging an underlying convex hull of the feasible
set, the marginal polytope, by an approximation called the local polytope [Wainwright and
Jordan, 2008]. However, even with this approximation the problems remain non-trivial, though
solvable, at least theoretically. A series of algorithmic schemes were proposed to this end for
the local polytope relaxations of both MAP [Komodakis et al., 2011; Ravikumar et al., 2010;
Savchynskyy et al., 2011; Savchynskyy et al., 2012; Schmidt et al., 2011; Schlesinger and
Giginyak, 2007; Kappes et al., 2012; Meshi and Globerson, 2011; Martins et al., 2011] and
marginalization [Wainwright et al., 2005; Jancsary and Matz, 2011; Hazan and Shashua, 2010;
Hazan et al., 2012]. It turns out that the corresponding dual problems have dramatically fewer
variables and contain very simple constraints [Werner, 2007, 2009]; hence they can even be
formulated as unconstrained problems, as it is done by Schlesinger and Giginyak [2007] and
Kappes et al. [2012]. Therefore, most of the approaches address optimization of the dual
objectives. A common difficulty for such approaches is the computation of a feasible relaxed
primal estimate from the current dual one. Infeasible estimates can typically be obtained from
the subgradients of the dual function as shown by Komodakis et al. [2011] or from the gradients
of the smoothed dual as done by Johnson et al. [2007], Werner [2009], and Savchynskyy et al.
[2011].

Even some approaches working in the primal domain [Hazan and Shashua, 2010; Martins et al.,
2011; Schmidt et al., 2011; Meshi and Globerson, 2011] maintain infeasible primal estimates,
whilst feasibility is guaranteed only in the limit.

Quite efficient primal schemes based on graph cuts proposed by Boykov et al. [2001] do
not solve the problem in general and optimality guarantees provided by them are typically
too weak. Hence we discuss neither these here, nor the widespread message passing and
belief propagation [Kolmogorov, 2006; Weiss and Freeman, 2001] methods, which also do not
guarantee the attainment of the optimum of the relaxed problem.



Forcing Feasibility of Primal Estimates. The literature on obtaining feasible primal
solutions for MRF inference problems from infeasible ones is not vast. Apart from
our conference papers [Savchynskyy et al., 2011; Schmidt et al., 2011; Savchynskyy et al.,
2012] preceding this work, we are aware of only two recent works contributing to this topic,
by Schlesinger et al. [2011] and Werner [2011].

The method proposed by Schlesinger et al. [2011] is formulated in the form of an algorithm
able to determine whether a given solution accuracy $\varepsilon$ is attained or not. To this end it
restricts the set of possible primal candidate solutions and solves an auxiliary quadratic
programming (QP) problem. However, this approach is unsuited to compute the actually
attained $\varepsilon$ directly and the auxiliary QP in the worst case grows linearly with the size of
the initial linear programming problem. Hence obtaining a feasible primal solution becomes
prohibitively slow as the size of the problem gets larger.

Another closely related method was proposed by Werner [2011]. It is, however, only suited
to determine whether a given solution of the dual problem is an optimal one. This makes it
impractical, since the state-of-the-art methods achieve the exact solution of the considered
problem only in the limit, after a potentially infinite number of iterations.



Content and Organization of the Paper. Besides this introduction the paper contains
five further sections. In Section 2 we describe a general formulation and mathematical
properties of the optimizing projection $\mathcal{P}(x, y)$, as already introduced for a special case
in (2)-(3). We do this without relating it to inference in MRFs, to allow readers not familiar
with the latter to grasp the idea. Section 3 is devoted to both MAP and marginalization
inference problems for MRFs and specifies how the optimizing projection can be constructed for
the corresponding primal and dual problems. In Section 4 we provide a list of algorithmic
schemes working in the dual domain and show how primal estimates can be reconstructed from
dual ones for all of them. The feasibility of the estimates is guaranteed by our optimizing
projection method. The last Sections 5 and 6 contain the experimental evaluation and
conclusions, respectively.



2 Optimizing Projection 

Let us denote by $\Pi_D : \mathbb{R}^n \to D$ a Euclidean projection to a set $D \subset \mathbb{R}^n$. Let
$X \subseteq \mathbb{R}^n$ and $Y \subseteq \mathbb{R}^m$ be two subsets of Euclidean spaces and
$D \subset X \times Y$ be a closed convex set. We will denote as $D_X$ the set
$\{x \in X \mid \exists y \in Y : (x, y) \in D\}$, that is the projection of $D$ to $X$.

The main definition of the paper introduces the notion of the optimizing projection in its
general form. A possible simplification and the corresponding discussion follow the definition.

Definition 1. Let $f : X \times Y \to \mathbb{R}$ be a continuous convex function of two variables. The
mapping $\mathcal{P}_{f,D} : X \times Y \to D$ such that $\mathcal{P}_{f,D}(x, y) = (x', y')$ defined as

$$x' = \Pi_{D_X}(x), \tag{4}$$

$$y' = \arg\min_{y :\ (x', y) \in D} f(x', y), \tag{5}$$

is called an optimizing projection onto the set $D$ w.r.t. the function $f$.

The definition shows the way to get a feasible point $(x', y') \in D$ from an arbitrary
infeasible one $(x, y)$. Of course, getting just any feasible point is not a big issue in many
cases. However, as we will see soon, the introduced optimizing projection possesses properties
similar to the properties of a standard Euclidean projection, which makes it a useful tool in
cases when its computation is easier than the one needed for the Euclidean projection. To
this end both the partial projection (4) and the partial minimization (5) should be efficiently
computable.

The role of projection (4) is to make $x$ "feasible", i.e. to guarantee for $x'$ that there is
at least one $y \in Y$ such that $(x', y) \in D$, which guarantees that the definition is
well-posed. If this condition holds already for $x$, it is easy to see that $x' = x$ and hence
computing (4) is trivial. We will call such $x$ feasible w.r.t. $D$. Indeed, in (4) one can apply an
arbitrary projection, since they all satisfy the mentioned property. However, we provide our
analysis for Euclidean projections only.

Example 2.0.1. Consider the linear programming problem (1) from the introduction. It is
reasonable to construct an optimizing projection $\mathcal{P}_{f,D}(x, y)$ for it as in (2)-(3), denoting
with $f$ and $D$ the objective function and the feasible set of the problem (1).
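The two steps of Definition 1 can be sketched on a toy instance that mimics the structure of (1). Everything specific in the Python sketch below is an assumption made for illustration and is not taken from the paper: the feasible set $D = \{(x, y) : 0 \le x \le 1,\ y \in \mathbb{R}^2_+,\ y_1 + y_2 = x\}$, the costs `A_COST` and `B_COST`, and all function names. The sketch only demonstrates that the constructed points are feasible and that, for this particular sequence, their objective values approach the optimum.

```python
# Toy optimizing projection (Definition 1) for an assumed feasible set
#   D = {(x, y) : 0 <= x <= 1, y >= 0, y[0] + y[1] = x}
# and an assumed linear objective f(x, y) = a*x + b[0]*y[0] + b[1]*y[1].

A_COST = -3.0          # hypothetical coefficient a
B_COST = (1.0, 2.0)    # hypothetical coefficients b


def f(x, y):
    return A_COST * x + B_COST[0] * y[0] + B_COST[1] * y[1]


def is_feasible(x, y, tol=1e-12):
    return (-tol <= x <= 1.0 + tol and min(y) >= -tol
            and abs(y[0] + y[1] - x) <= tol)


def optimizing_projection(x, y):
    """Return (x', y') as in (4)-(5); y itself is not used by the mapping."""
    # Step (4): Euclidean projection of x onto D_X = [0, 1].
    x_new = min(1.0, max(0.0, x))
    # Step (5): minimize <b, y> over {y >= 0, y[0] + y[1] = x'}.
    # A linear function on this segment is minimized by putting
    # the whole mass x' on the cheapest coordinate.
    k = 0 if B_COST[0] <= B_COST[1] else 1
    y_new = [0.0, 0.0]
    y_new[k] = x_new
    return x_new, tuple(y_new)


# For these costs the optimum of f on D is attained at x = 1, y = (1, 0).
F_STAR = f(1.0, (1.0, 0.0))

gaps = []
for t in range(1, 6):
    # An infeasible sequence converging to the optimum (y has a negative entry).
    x_t = 1.0 - 1.0 / (t + 1)
    y_t = (1.0 + 1.0 / t, -1.0 / t)
    x_p, y_p = optimizing_projection(x_t, y_t)
    assert is_feasible(x_p, y_p)
    gaps.append(f(x_p, y_p) - F_STAR)

print(gaps)  # non-negative and shrinking towards zero
```

Both steps are closed-form here, which is the situation the optimizing projection is designed for: the partial projection and the partial minimization are each cheap, whereas a joint Euclidean projection onto $D$ would couple $x$ and $y$.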

We will deal with objective functions which fulfill the following definition:

Definition 2. A function $f : X \times Y \to \mathbb{R}$ is called Lipschitz-continuous w.r.t. its first
argument $x$, if there exists a finite constant $L_X(f)$, such that for all $y \in Y$, $x, x' \in X$

$$|f(x, y) - f(x', y)| \le L_X(f)\,\|x - x'\| \tag{6}$$

holds. Similarly $f$ is Lipschitz-continuous w.r.t.

• $y$ if $|f(x, y) - f(x, y')| \le L_Y(f)\,\|y - y'\|$ for all $x \in X$, $y, y' \in Y$ and some constant
$L_Y(f)$;

• $z = (x, y)$ if $|f(x, y) - f(x', y')| \le L_{XY}(f)\,\|z - z'\|$ for all $z, z' \in X \times Y$ and some
constant $L_{XY}(f)$.

The following theorem specifies the main property of the optimizing projection, namely
its continuity with respect to the optimal value of the function $f$.

Theorem 2.0.1. Let $f$ be convex and Lipschitz-continuous w.r.t. its arguments $x$ and $y$ and
let $f^*$ be the minimum of $f$ on the set $D$. Then for all $z = (x, y) \in X \times Y$

$$|f(\mathcal{P}_{f,D}(x, y)) - f^*| \le |f(x, y) - f^*| + (L_X(f) + L_Y(f))\,\|z - \Pi_D(z)\| \tag{7}$$

holds. If additionally $x$ is feasible w.r.t. $D$ the tighter inequality holds:

$$|f(\mathcal{P}_{f,D}(x, y)) - f^*| \le |f(x, y) - f^*| + L_Y(f)\,\|z - \Pi_D(z)\|. \tag{8}$$

Proof. We will denote $(x^p, y^p) = z^p = \Pi_D(z)$ and $(x', y') = \mathcal{P}_{f,D}(x, y)$. Note that

• from $f^* \le f(x', y') \le f(x', y'')$ for any $y'' \in Y$ such that $(x', y'') \in D$ it follows that

$$f^* \le f(x', y') \le f(x', y^p), \tag{9}$$

• from $\|z - z^p\|^2 = \|x - x^p\|^2 + \|y - y^p\|^2$ it follows that

$$\|y - y^p\| \le \|z - z^p\| \quad \text{and} \quad \|x - x^p\| \le \|z - z^p\|, \tag{10}$$

• according to (4), $x' = \Pi_{D_X}(x) = \arg\min_{\hat{x} \in D_X} \|x - \hat{x}\|$ and hence
$\|x - x'\| \le \|x - x^p\|$ since $x^p \in D_X$. Combining this with (10) we obtain

$$\|x - x'\| \le \|z - z^p\|. \tag{11}$$

The proof follows from the following sequence of inequalities:

$$
\begin{aligned}
|f(\mathcal{P}_{f,D}(x, y)) - f^*| = |f(x', y') - f^*| &\overset{(9)}{\le} |f(x', y^p) - f^*| \\
&\le |f(x', y^p) - f(x', y)| + |f(x', y) - f^*| \\
&\le L_Y(f)\,\|y - y^p\| + |f(x', y) - f^*| \\
&\overset{(10)}{\le} L_Y(f)\,\|z - z^p\| + |f(x', y) - f^*|.
\end{aligned}
\tag{12}
$$

Estimate (8) follows from (12) assuming that $x' = x$.

The proof for the more general case (7) follows from (12) and Lipschitz-continuity of $f$
w.r.t. $x$:

$$
\begin{aligned}
|f(\mathcal{P}_{f,D}(x, y)) - f^*| &\overset{(12)}{\le} L_Y(f)\,\|z - z^p\| + |f(x', y) - f^*| \\
&\le L_Y(f)\,\|z - z^p\| + |f(x', y) - f(x, y)| + |f(x, y) - f^*| \\
&\le L_Y(f)\,\|z - z^p\| + L_X(f)\,\|x' - x\| + |f(x, y) - f^*| \\
&\overset{(11)}{\le} L_Y(f)\,\|z - z^p\| + L_X(f)\,\|z - z^p\| + |f(x, y) - f^*| \\
&= (L_Y(f) + L_X(f))\,\|z - z^p\| + |f(x, y) - f^*|.
\end{aligned}
\tag{13}
$$

□

Theorem 2.0.1 basically states that if the sequence $z^t = (x^t, y^t) \in X \times Y$, $t = 1, \dots, \infty$
weakly converges to the optimum of $f$, then the same holds also for $\mathcal{P}_{f,D}(x^t, y^t)$.
Moreover, the rate of convergence is preserved up to a multiplicative constant. Please note that
$\mathcal{P}_{f,D}(x, y)$ actually does not depend on $y$; it is needed only for the convergence estimates
(7) and (8), but not for the optimizing projection itself.

Let us provide an analogous bound for the Euclidean projection to get an idea of how good
the estimate given by Theorem 2.0.1 is. Let $z$ and $z^p$ be defined as in the proof of the
theorem. Then

$$|f(z^p) - f^*| \le |f(z^p) - f(z)| + |f(z) - f^*| \le |f(z) - f^*| + L_{XY}(f)\,\|z - z^p\|. \tag{14}$$

We see that bounds (7) and (14) for the optimizing mapping and Euclidean projection
differ only by a constant factor: in the optimizing mapping, the Lipschitz continuity of the
objective $f$ is considered w.r.t. each variable $x$ and $y$ separately, whereas the Euclidean
projection is based on the Lipschitz continuity w.r.t. the pair of variables $(x, y)$.

The following lemma shows the difference between these two Lipschitz constants. Together
with the next one it will be used extensively in the rest of the paper:

Lemma 2.0.1. The linear function $f(x, y) = \langle a, x \rangle + \langle b, y \rangle$ is Lipschitz-continuous with
Lipschitz constants $L_X(f) \le \|a\|$, $L_Y(f) \le \|b\|$ and $L_{XY}(f) \le \sqrt{L_X(f)^2 + L_Y(f)^2}$.

Proof. All three Lipschitz constants are derived from the Cauchy-Bunyakovsky-Schwarz
inequality

$$\langle c, \nu \rangle \le \|c\| \cdot \|\nu\|, \quad c, \nu \in \mathbb{R}^n, \tag{15}$$

applied respectively to $x$, $y$ and $z = (x, y)$ in place of $\nu$. □

Lemma 2.0.2. The function $f(z) = \langle a, z \rangle + \sum_{i=1}^{N} z_i \log z_i$, where $\log$ denotes the
natural logarithm, is Lipschitz-continuous in the box $[\varepsilon, M]^N \ni z$, $\varepsilon > 0$,
$M > \varepsilon$ with Lipschitz constant

$$L_{XY}(f) \le \|a\| + N \max\{|1 + \log \varepsilon|,\ |1 + \log M|\}. \tag{16}$$

Proof. The function $f_i(z_i) = z_i \log z_i$ of a single variable is differentiable on $[\varepsilon, M]$ and
its derivative $f_i'(z_i) = 1 + \log z_i$ is monotone increasing, hence $f_i(z_i)$ is convex. This implies
$f_i(z_i) - f_i(z_i') \le f_i'(z_i)(z_i - z_i')$ and $|f_i(z_i) - f_i(z_i')| \le |f_i'(z_i)|\,|z_i - z_i'|$. Taking into
account that due to monotonicity $|f_i'(z_i)| \le \max\{|1 + \log \varepsilon|,\ |1 + \log M|\}$ for
$z_i \in [\varepsilon, M]$, and using the fact that $L(f_1 + f_2) \le L(f_1) + L(f_2)$ together with
Lemma 2.0.1 one obtains (16). □
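As a sanity check of the bound (16), the following Python sketch samples random pairs of points in the box and compares the largest observed difference quotient with the right-hand side of (16). The cost vector `a`, the box limits and the helper names are assumptions chosen for illustration; random sampling can only fail to find a violation, it does not prove the lemma.

```python
import math
import random


def f(a, z):
    """f(z) = <a, z> + sum_i z_i * log(z_i), as in Lemma 2.0.2."""
    return sum(ai * zi + zi * math.log(zi) for ai, zi in zip(a, z))


def lipschitz_bound(a, eps, big_m):
    """Right-hand side of (16)."""
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    slope = max(abs(1.0 + math.log(eps)), abs(1.0 + math.log(big_m)))
    return norm_a + len(a) * slope


def max_observed_ratio(a, eps, big_m, trials=2000, seed=0):
    """Largest |f(z) - f(w)| / ||z - w|| over random pairs in the box."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        z = [rng.uniform(eps, big_m) for _ in a]
        w = [rng.uniform(eps, big_m) for _ in a]
        dist = math.sqrt(sum((zi - wi) ** 2 for zi, wi in zip(z, w)))
        if dist > 0.0:
            worst = max(worst, abs(f(a, z) - f(a, w)) / dist)
    return worst


a = [0.5, -1.0, 2.0]          # hypothetical cost vector
eps, big_m = 1e-3, 1.0        # hypothetical box [eps, M]^3
bound = lipschitz_bound(a, eps, big_m)
ratio = max_observed_ratio(a, eps, big_m)
print(ratio <= bound)
```

The bound deteriorates as $\varepsilon \to 0$ because $|1 + \log \varepsilon|$ grows without limit, which is why the lemma is stated on a box bounded away from zero.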
3 MRF Inference and Optimizing Projections

In this section we consider optimization problems related to inference in MRFs and construct
corresponding optimizing projections. We switch from the general mathematical notation
used in the previous sections to the one specific for the considered field; in particular we
mostly follow the book of Wainwright and Jordan [2008].

The section consists of two subsections. The first one describes the MAP-inference problem
for MRFs, its local polytope relaxation and the primal and dual formulations. In that
subsection we show how the optimizing projection introduced in Section 2 can be applied to
obtain both primal and dual feasible estimates.

The second part is devoted to a decomposition-based dual formulation, and it introduces
basic notions for Section 4. Additionally we show here how feasible primal estimates can be
obtained for the tree-reweighted free energy introduced by Wainwright et al. [2005].



3.1 Local Polytope Relaxation 

This section is devoted to the maximum a posteriori (MAP) inference problem for Markov
random fields, also known as an energy minimization problem. We derive primal and dual
formulations for the so-called local polytope relaxation of the problem, analyze their
separability properties and construct the corresponding optimizing projections.

3.1.1 Primal Problem 

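Preliminaries. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be an undirected graph, where $\mathcal{V}$ is a finite set of
nodes and $\mathcal{E} \subset \mathcal{V} \times \mathcal{V}$ is a set of edges. Let further $\mathcal{X}_v$, $v \in \mathcal{V}$, be finite
sets of labels. The set $\mathcal{X} = \otimes_{v \in \mathcal{V}} \mathcal{X}_v$, where $\otimes$ denotes the Cartesian product,
will be called the labeling set and its elements $x \in \mathcal{X}$ are labelings. Thus each labeling is a
collection $(x_v : v \in \mathcal{V})$ of labels. To shorten notation we will use $x_{uv}$ for a pair of labels
$(x_u, x_v)$ and $\mathcal{X}_{uv}$ for $\mathcal{X}_u \times \mathcal{X}_v$. The collections of numbers $\theta_{v,x_v}$, $v \in \mathcal{V}$,
$x_v \in \mathcal{X}_v$ and $\theta_{uv,x_{uv}}$, $uv \in \mathcal{E}$, $x_{uv} \in \mathcal{X}_{uv}$ will be called unary and pairwise
potentials, respectively. The collection of all potentials will be denoted by $\theta$.

The problem is to compute the labeling $x$ which minimizes the energy function $E_{\mathcal{G}}$:

$$\min_{x \in \mathcal{X}} E_{\mathcal{G}}(\theta, x) = \min_{x \in \mathcal{X}} \Big\{ \sum_{v \in \mathcal{V}} \theta_{v,x_v} + \sum_{uv \in \mathcal{E}} \theta_{uv,x_{uv}} \Big\}. \tag{17}$$

An alternative way of writing problem (17) is to express it in the form of a scalar product
of the vector $\theta$, denoting the collection of all $\theta_{v,x_v}$, $v \in \mathcal{V}$, $x_v \in \mathcal{X}_v$ and
$\theta_{uv,x_{uv}}$, $uv \in \mathcal{E}$, $x_{uv} \in \mathcal{X}_{uv}$, with a suitably constructed binary vector $\phi(x)$,
$x \in \mathcal{X}$:

$$\min_{x \in \mathcal{X}} \langle \theta, \phi(x) \rangle. \tag{18}$$

Denoting $\mathbb{R}^{\sum_{v \in \mathcal{V}} |\mathcal{X}_v| + \sum_{uv \in \mathcal{E}} |\mathcal{X}_{uv}|}$ as $\mathbb{R}(\mathcal{M})$ and the corresponding
non-negative cone as $\mathbb{R}_+(\mathcal{M})$, we relax (17) to the linear programming problem [Schlesinger,
1976; Werner, 2007]

$$
\begin{aligned}
\min_{\mu \in \mathbb{R}_+(\mathcal{M})} \quad & \sum_{v \in \mathcal{V}} \sum_{x_v \in \mathcal{X}_v} \theta_{v,x_v} \mu_{v,x_v} + \sum_{uv \in \mathcal{E}} \sum_{x_{uv} \in \mathcal{X}_{uv}} \theta_{uv,x_{uv}} \mu_{uv,x_{uv}} \\
\text{s.t.} \quad & \sum_{x_v \in \mathcal{X}_v} \mu_{v,x_v} = 1, \quad v \in \mathcal{V}, \\
& \sum_{x_v \in \mathcal{X}_v} \mu_{uv,x_{uv}} = \mu_{u,x_u}, \quad x_u \in \mathcal{X}_u,\ uv \in \mathcal{E}, \\
& \sum_{x_u \in \mathcal{X}_u} \mu_{uv,x_{uv}} = \mu_{v,x_v}, \quad x_v \in \mathcal{X}_v,\ uv \in \mathcal{E}.
\end{aligned}
\tag{19}
$$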
The constraints in (19) form the local polytope, later on denoted as $\mathcal{L}$. Slightly abusing
notation, we will briefly write problem (19) as

$$\min_{\mu \in \mathcal{L}} E(\mu) := \min_{\mu \in \mathcal{L}} \langle \theta, \mu \rangle. \tag{20}$$

Remark 3.1.1. Please note that introducing additional constraints

$$\mu_{v,x_v} \in \{0, 1\} \ \text{ and } \ \mu_{uv,x_{uv}} \in \{0, 1\}, \quad v \in \mathcal{V},\ uv \in \mathcal{E},\ x_v \in \mathcal{X}_v,\ x_{uv} \in \mathcal{X}_{uv}, \tag{21}$$

would make (19) equivalent to (17). Each labeling $x \in \mathcal{X}$ corresponds to some point $\mu$
satisfying the conditions of (19) and (21), namely the one having $\mu_{v,x'_v} = 1$ iff $x'_v = x_v$,
and $0$ else.



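Optimizing Projection. We will denote as $\theta_w$ and $\mu_w$, $w \in \mathcal{V} \cup \mathcal{E}$, the collections
of $\theta_{w,x_w}$ and $\mu_{w,x_w}$, $x_w \in \mathcal{X}_w$, respectively. Hence the vectors $\theta$ and $\mu$ become
collections of $\theta_w$ and $\mu_w$, $w \in \mathcal{V} \cup \mathcal{E}$. The $n$-dimensional simplex
$\{x \in \mathbb{R}^n_+ : \sum_{i=1}^{n} x_i = 1\}$ will be denoted as $\Delta(n)$.
Problem (19) has a separable structure similar to (1), i.e. for suitably selected matrices
$A_{uv}$ it can be written as

$$
\begin{aligned}
\min_{\mu \in \mathbb{R}(\mathcal{M})} \quad & \sum_{v \in \mathcal{V}} \langle \theta_v, \mu_v \rangle + \sum_{uv \in \mathcal{E}} \langle \theta_{uv}, \mu_{uv} \rangle \\
\text{s.t.} \quad & \mu_v \in \Delta(|\mathcal{X}_v|), \quad v \in \mathcal{V}, \\
& A_{uv} \mu_{uv} = \mu_v, \quad \mu_{uv} \ge 0, \quad uv \in \mathcal{E}.
\end{aligned}
\tag{22}
$$

Note that under fixed $\mu_v$ the optimization of (22) splits into small independent subproblems,
one for each $uv \in \mathcal{E}$. We will use this fact to compute the optimizing projection onto the
local polytope $\mathcal{L}$ as follows.

Let $\mu_{\mathcal{V}}$ and $\mu_{\mathcal{E}}$ be collections of primal variables corresponding to graph nodes and edges
respectively, i.e. $\mu_{\mathcal{V}} = (\mu_v,\ v \in \mathcal{V})$ and $\mu_{\mathcal{E}} = (\mu_{uv},\ uv \in \mathcal{E})$. The
corresponding subspaces will be denoted by $\mathbb{R}(\mathcal{M}_{\mathcal{V}})$ and $\mathbb{R}(\mathcal{M}_{\mathcal{E}})$. Then according to
(22) and Definition 1 the optimizing projection
$\mathcal{P}_{\theta,\mathcal{L}} : \mathbb{R}(\mathcal{M}_{\mathcal{V}}) \times \mathbb{R}(\mathcal{M}_{\mathcal{E}}) \to \mathcal{L}$ maps $(\mu_{\mathcal{V}}, \mu_{\mathcal{E}})$ to
$(\mu'_{\mathcal{V}}, \mu'_{\mathcal{E}})$ defined as

$$\mu'_v = \Pi_{\Delta(|\mathcal{X}_v|)}(\mu_v), \quad v \in \mathcal{V}, \tag{23}$$

$$\mu'_{uv} = \arg\min_{\mu_{uv} \ge 0} \langle \theta_{uv}, \mu_{uv} \rangle \quad \text{s.t.} \quad A_{uv} \mu_{uv} = \mu'_v, \quad uv \in \mathcal{E}. \tag{24}$$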

Note that both (23) and (24) can be computed very efficiently. Projection to a simplex
in (23) can be done e.g. by the method of Michelot [1986]. The optimization problem in (24)
constitutes a small-sized transportation problem well-studied in linear programming, see for
example the textbook of Bazaraa and Jarvis [1977].
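The two steps (23)-(24) can be sketched in Python as follows. Several choices here are assumptions for illustration only: the simplex projection uses a common sort-based procedure rather than the method of Michelot itself, the edge step is written only for the special case of two labels per node (where the transportation problem has a single free parameter and can be solved in closed form), the coupling constraints are taken to be the marginalization constraints of (19) for both end nodes, and all names and numbers are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x >= 0, sum(x) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0.0:
            tau = candidate        # keep the threshold of the largest valid k
    return [max(x - tau, 0.0) for x in v]


def edge_step_binary(theta_uv, mu_u, mu_v):
    """Solve (24) for two labels per node.

    theta_uv[a][b] is the cost of labels (a, b) for nodes (u, v); mu_u and
    mu_v are feasible unary vectors. Returns the 2x2 table mu_uv.
    """
    # Marginalization leaves one free parameter t = mu_uv[0][0].
    lo = max(0.0, mu_u[0] + mu_v[0] - 1.0)
    hi = min(mu_u[0], mu_v[0])

    def table(t):
        return [[t, mu_u[0] - t],
                [mu_v[0] - t, 1.0 - mu_u[0] - mu_v[0] + t]]

    def cost(t):
        m = table(t)
        return sum(theta_uv[a][b] * m[a][b] for a in (0, 1) for b in (0, 1))

    # The cost is linear in t, so an endpoint of [lo, hi] is optimal.
    return table(lo if cost(lo) <= cost(hi) else hi)


# Infeasible unary estimates are first projected, cf. (23) ...
mu_u = project_to_simplex([0.7, 0.6])
mu_v = project_to_simplex([0.3, 0.5])
# ... then the pairwise variables are recomputed, cf. (24).
theta_uv = [[0.0, 0.0], [100.0, 0.0]]   # hypothetical: labels (1, 0) are expensive
mu_uv = edge_step_binary(theta_uv, mu_u, mu_v)
print(mu_u, mu_v, mu_uv)
```

For more than two labels per node the edge step is a general small transportation problem and would be handed to an LP solver; the closed form above is specific to the binary case.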



Let us apply Theorem 2.0.1 and Lemma 2.0.1 to the optimizing projection $\mathcal{P}_{\theta,\mathcal{L}}$
introduced in Definition 1. According to these, the convergence rate of a given sequence
$\mu^t \in \mathbb{R}(\mathcal{M})$ in the worst case slows down by a factor
$L_{\mathcal{M}_{\mathcal{V}}}(E) + L_{\mathcal{M}_{\mathcal{E}}}(E) \le \|\theta_{\mathcal{V}}\| + \|\theta_{\mathcal{E}}\|$. This factor can
be quite large, but since the optimum $E^*$ grows together with the value
$\|\theta_{\mathcal{V}}\| + \|\theta_{\mathcal{E}}\|$, its influence on the obtained relative accuracy is typically much
less than the value itself.

Remark 3.1.2. However, if $\theta$ contains "infinite" numbers, typically assigned to pairwise factors
to model "hard" constraints, both optimizing and Euclidean projections can be quite bad,
which is demonstrated by the following example, depicted in Fig. 1: $\mathcal{V} = \{v, u\}$,
$\mathcal{E} = \{vu\}$, $\mathcal{X}_v = \mathcal{X}_u = \{0, 1\}$, $\theta_{00} = \theta_{11} = \theta_{01} = 0$, $\theta_{10} = \infty$. If now
$\mu_{v,1} > \mu_{u,1}$, optimizing w.r.t. $\mu_{vu}$ leads to
$\theta_{10} \cdot \mu_{vu,10} = \infty \cdot (\mu_{v,1} - \mu_{u,1})$, whose value can be arbitrarily large, depending
on the actual numerical value approximating $\infty$. And since neither the optimizing projection
nor the Euclidean one takes into account the actual values of pairwise factors when assigning
values to $\mu_{\mathcal{V}}$, the relation $\mu_{v,1} > \mu_{u,1}$ is not controlled.
Figure 1: A pairwise factor of a graphical model. Vertically oriented rectangles correspond
to graph nodes $v$ and $u$, black circles inside to variable states 0 and 1. Lines connecting
states in two nodes correspond to different values of pairwise potentials. Potentials
corresponding to all states and pairs except the one $\theta_{vu,10}$ denoted by a thick red line are
assumed to be equal to 0, whereas $\theta_{vu,10}$ is assumed to be infinitely large. Clearly optimal
values of primal variables $\mu_v$ and $\mu_u$ assigned to these nodes always satisfy
$\mu_{v,1} \le \mu_{u,1}$. Otherwise, due to local polytope constraints, an optimal value $\mu_{vu,10}$
corresponding to the infinite pairwise factor value is equal to $\mu_{v,1} - \mu_{u,1}$. This
corresponds to arbitrarily large primal objective values even for very small positive values of
$\mu_{v,1} - \mu_{u,1}$.



We provide an additional numerical simulation related to infinite values of pairwise
potentials in Section 5.

Remark 3.1.3 (Higher order models and relaxations). The generalization of the optimizing
projection (23)-(24) for both higher order models and higher order local polytopes introduced
by Wainwright and Jordan [2008, Sec. 8.5] is quite straightforward. The underlying idea
remains the same: one has to fix a subset of variables such that the resulting optimization
problem splits into a number of small ones.

Remark 3.1.4 (Efficient representation of the relaxed primal solution). Note that since the
pairwise primal variables $\mu_{\mathcal{E}}$ can be easily recomputed from unary ones $\mu_{\mathcal{V}}$, it is
sufficient to store only the latter if one is not interested in specific values of pairwise
variables $\mu_{\mathcal{E}}$. Because of possible degeneracy, there may exist more than a single vector
$\mu_{\mathcal{E}}$ optimizing the energy $E$ for given $\mu_{\mathcal{V}}$.

3.1.2 Lagrange Dual Problem 

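Preliminaries. Problem (19) can be written in a more compact form with a suitably
selected matrix $A$ and vector $b$:

$$\min_{\mu \in \mathbb{R}_+(\mathcal{M})} \langle \theta, \mu \rangle \quad \text{s.t.} \quad A\mu = b. \tag{25}$$

Introducing the space $\mathbb{R}(\mathbb{D}) := \mathbb{R}^{|\mathcal{V}| + |\mathcal{E}| + \sum_{uv \in \mathcal{E}} (|\mathcal{X}_v| + |\mathcal{X}_u|)}$, the dual
problem reads

$$\max_{\nu \in \mathbb{R}(\mathbb{D})} \langle b, \nu \rangle \quad \text{s.t.} \quad A^\top \nu \le \theta. \tag{26}$$

In what follows we will sometimes require an explicit form of $A^\top$. To this end we denote
as $\mathcal{N}(v) = \{u \in \mathcal{V} : uv \in \mathcal{E}\}$ the set of neighboring nodes of a node $v \in \mathcal{V}$. We consider
the dual variable $\nu \in \mathbb{R}(\mathbb{D})$ to consist of the following groups of coordinates: $\nu_v$,
$v \in \mathcal{V}$; $\nu_{uv}$, $uv \in \mathcal{E}$; and $\nu_{v \to u, x_v}$, $v \in \mathcal{V}$, $u \in \mathcal{N}(v)$, $x_v \in \mathcal{X}_v$. The dual
(26) can be written explicitly [Schlesinger, 1976; Werner, 2007] as: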
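$$\max_{\nu \in \mathbb{R}(\mathbb{D})} \sum_{v \in \mathcal{V}} \nu_v + \sum_{uv \in \mathcal{E}} \nu_{uv} \tag{27}$$

$$
\text{s.t.} \quad
\begin{aligned}
& \theta_{v,x_v} - \sum_{u \in \mathcal{N}(v)} \nu_{v \to u, x_v} \ge \nu_v, \quad \forall v \in \mathcal{V},\ x_v \in \mathcal{X}_v, \\
& \theta_{uv,x_{uv}} + \nu_{u \to v, x_u} + \nu_{v \to u, x_v} \ge \nu_{uv}, \quad \forall uv \in \mathcal{E},\ x_{uv} \in \mathcal{X}_{uv}.
\end{aligned}
\tag{28}
$$

We will use the notation $\mathcal{U}(\nu) := \langle b, \nu \rangle = \sum_{v \in \mathcal{V}} \nu_v + \sum_{uv \in \mathcal{E}} \nu_{uv}$ for the
objective function of the dual problem (26).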



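Optimizing Projection. The dual (26) possesses clear separability as well. From (28) it
follows that after fixing all variables except $\nu_v$, $v \in \mathcal{V}$, and $\nu_{uv}$, $uv \in \mathcal{E}$, the
optimization w.r.t. the latter splits into a series of small and straightforward minimizations
over a small set of values

$$\nu_v = \min_{x_v \in \mathcal{X}_v} \Big( \theta_{v,x_v} - \sum_{u \in \mathcal{N}(v)} \nu_{v \to u, x_v} \Big), \quad v \in \mathcal{V}, \tag{29}$$

$$\nu_{uv} = \min_{x_{uv} \in \mathcal{X}_{uv}} \big( \theta_{uv,x_{uv}} + \nu_{u \to v, x_u} + \nu_{v \to u, x_v} \big), \quad uv \in \mathcal{E}. \tag{30}$$

The formula (29) can be applied directly for each $v \in \mathcal{V}$, and (30) accordingly for each
$uv \in \mathcal{E}$.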

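We denote by $\mathcal{D}$ the dual feasible set $\{\nu \in \mathbb{R}(\mathbb{D}) : A^\top \nu \le \theta\}$. We split all dual
variables into two groups. The first one will contain "messages"
$\nu_\to = (\nu_{v \to u},\ v \in \mathcal{V},\ u \in \mathcal{N}(v))$, that is, variables which reweight unary and pairwise
potentials leading to improving the objective. The vector space containing all possible values
of these variables will be denoted as $\mathbb{R}(\mathbb{D}_\to)$. The second group will contain lower bounds
on optimal reweighted unary and pairwise potentials $\nu_0 = (\nu_w,\ w \in \mathcal{V} \cup \mathcal{E})$. The total
sum of their values constitutes the dual objective. All possible values of these variables will
form the vector space $\mathbb{R}(\mathbb{D}_0)$. Hence the optimizing projection
$\mathcal{P}_{\mathcal{U},\mathcal{D}} : \mathbb{R}(\mathbb{D}_\to) \times \mathbb{R}(\mathbb{D}_0) \to \mathbb{R}(\mathbb{D})$ maps $(\nu_\to, \nu_0)$ to
$(\nu'_\to, \nu'_0)$ as

$$\nu'_{v \to u} = \nu_{v \to u}, \quad v \in \mathcal{V},\ u \in \mathcal{N}(v), \tag{31}$$

$$\nu'_v = \min_{x_v \in \mathcal{X}_v} \Big( \theta_{v,x_v} - \sum_{u \in \mathcal{N}(v)} \nu'_{v \to u, x_v} \Big), \quad v \in \mathcal{V}, \tag{32}$$

$$\nu'_{uv} = \min_{x_{uv} \in \mathcal{X}_{uv}} \big( \theta_{uv,x_{uv}} + \nu'_{u \to v, x_u} + \nu'_{v \to u, x_v} \big), \quad uv \in \mathcal{E}. \tag{33}$$

Equation (31) corresponds to the projection (4), which has the form
$\Pi_{\mathbb{R}(\mathbb{D}_\to)}(\nu_\to) = \nu_\to$ and is thus trivial.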
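The mapping (31)-(33) can be sketched in a few lines of Python. The dictionary-based data layout, the function names and the two-node model below are assumptions made for illustration. The final comparison uses weak duality: any point produced by (31)-(33) is dual feasible, so its objective value cannot exceed the minimal energy.

```python
import itertools


def dual_optimizing_projection(theta_unary, theta_pair, messages):
    """Return (nu_nodes, nu_edges) as in (32)-(33); messages stay unchanged, cf. (31).

    theta_unary[v][x_v], theta_pair[(u, v)][x_u][x_v] and
    messages[(v, u)][x_v] (standing for nu_{v -> u, x_v}) are plain lists.
    """
    nu_nodes = {}
    for v, theta_v in theta_unary.items():
        outgoing = [m for (src, _dst), m in messages.items() if src == v]
        nu_nodes[v] = min(
            theta_v[x] - sum(m[x] for m in outgoing)
            for x in range(len(theta_v))
        )
    nu_edges = {}
    for (u, v), table in theta_pair.items():
        m_uv, m_vu = messages[(u, v)], messages[(v, u)]
        nu_edges[(u, v)] = min(
            table[xu][xv] + m_uv[xu] + m_vu[xv]
            for xu in range(len(table))
            for xv in range(len(table[0]))
        )
    return nu_nodes, nu_edges


def dual_value(nu_nodes, nu_edges):
    """Dual objective: sum of all node and edge lower bounds."""
    return sum(nu_nodes.values()) + sum(nu_edges.values())


# Hypothetical two-node model with a single edge ("u", "v").
theta_unary = {"u": [0.0, 1.0], "v": [2.0, 0.0]}
theta_pair = {("u", "v"): [[0.0, 1.0], [1.0, 0.0]]}
# Arbitrary message values; any choice yields a dual feasible point.
messages = {("u", "v"): [0.5, -0.5], ("v", "u"): [0.0, 0.0]}

nu_nodes, nu_edges = dual_optimizing_projection(theta_unary, theta_pair, messages)
lower_bound = dual_value(nu_nodes, nu_edges)

# Brute-force minimum energy for comparison (weak duality).
min_energy = min(
    theta_unary["u"][xu] + theta_unary["v"][xv] + theta_pair[("u", "v")][xu][xv]
    for xu, xv in itertools.product(range(2), repeat=2)
)
print(lower_bound, min_energy)
```

Because the messages pass through unchanged, the whole projection costs one pass over the labels of every node and every edge.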

Applying Theorem 2.0.1 and Lemma 2.0.1 to the optimizing projection $\mathcal{P}_{\mathcal{U},\mathcal{D}}$ yields that
the convergence of the projected $\nu^t$ slows down by no more than a factor
$L_{\mathbb{D}_0}(\mathcal{U}) \le \sqrt{|\mathcal{V}| + |\mathcal{E}|}$ and does not depend on the potentials $\theta$. However, since
an optimal energy value often grows proportionally to $|\mathcal{V}| + |\mathcal{E}|$, the influence of the
factor on the estimated relative precision is typically insignificant.



3.2 Decomposition Based Dual Problem 

In this section we introduce an alternatively constructed dual objective, corresponding to
the local polytope relaxation of the MAP-inference problem. We also consider a smoothed
approximation thereof and show its connection to the so-called tree-reweighted free energy.
For the latter, we additionally construct the corresponding optimizing projection.
3.2.1 Non-Smooth Dual Objective and Associated Subgradient

Graph Decomposition. There is an alternative way to formulate a dual problem to (17).
The corresponding technique is called Lagrangian or dual decomposition. We describe it here,
because it is the basis for most state-of-the-art dual algorithms for MAP and marginalization
inference. In Section 4 we show how one can reconstruct primal estimates from dual ones for
a range of such algorithms.

The dual decomposition technique is based on decomposing the graph $\mathcal{G}$ into several
subgraphs, which jointly cover $\mathcal{G}$ and for which solving (17) is easy. The subgraphs'
structures determine the underlying relaxation, and it is shown by Komodakis et al. [2011] that
if all subgraphs are acyclic, the corresponding relaxation coincides with the local polytope
relaxation, defined by (19). It is also known that on acyclic subgraphs problem (17) can be
efficiently solved by dynamic programming.

To keep our exposition simple, we will consider the case of the graph $\mathcal{G}$ being completely
covered by only two acyclic subgraphs, which can be done e.g. when $\mathcal{G}$ has a grid structure.
This allows us to avoid technical details, preserves the main idea and can be generalized to
more involved decompositions quite straightforwardly.

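Let $\mathcal{G}^i = (\mathcal{V}^i, \mathcal{E}^i)$, $i = 1, 2$, be two acyclic subgraphs of the master graph $\mathcal{G}$. Let
$\mathcal{V}^1 = \mathcal{V}^2 = \mathcal{V}$, $\mathcal{E}^1 \cup \mathcal{E}^2 = \mathcal{E}$ and $\mathcal{E}^1 \cap \mathcal{E}^2 = \emptyset$ (e.g., $\mathcal{E}^1$ may
contain all horizontal edges of $\mathcal{G}$ and $\mathcal{E}^2$ all vertical ones if $\mathcal{G}$ is a grid graph). Then
the overall energy becomes the sum of the energies corresponding to these subgraphs,

$$E_{\mathcal{G}}(\theta, x) = \sum_{i=1}^{2} \Big( \sum_{v \in \mathcal{V}^i} \theta^i_{v,x_v} + \sum_{uv \in \mathcal{E}^i} \theta^i_{uv,x_{uv}} \Big) = E_{\mathcal{G}^1}(\theta^1, x) + E_{\mathcal{G}^2}(\theta^2, x), \tag{34}$$

provided $\theta^i_{uv} = \theta_{uv}$ for $uv \in \mathcal{E}^i$ and $\theta^i_{uv} = 0$ for $uv \notin \mathcal{E}^i$, $i = 1, 2$, and
$\theta^1_{v,x_v} + \theta^2_{v,x_v} = \theta_{v,x_v}$ for all $v \in \mathcal{V}$, $x_v \in \mathcal{X}_v$.

The latter condition can be represented in a parametric way as
$\theta^1_{v,x_v} = \frac{\theta_{v,x_v}}{2} + \lambda_{v,x_v}$ and $\theta^2_{v,x_v} = \frac{\theta_{v,x_v}}{2} - \lambda_{v,x_v}$,
$v \in \mathcal{V}$, $x_v \in \mathcal{X}_v$, where $\lambda_{v,x_v} \in \mathbb{R}$. Thus we consider $\theta^i$ as a function of
$\lambda = (\lambda_{v,x_v} : v \in \mathcal{V},\ x_v \in \mathcal{X}_v)$ and have

$$\min_{x \in \mathcal{X}} E_{\mathcal{G}}(\theta, x) \ \ge\ \max_{\lambda} \sum_{i=1}^{2} \min_{x \in \mathcal{X}} E_{\mathcal{G}^i}(\theta^i(\lambda), x) \ =\ \min_{\mu \in \mathcal{L}} E(\mu). \tag{35}$$

The last equation is not straightforward and we refer to the paper of Komodakis et al. [2011]
for the proof.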

Subgradient. The function

$$U(\lambda) := \sum_{i=1}^{2}U^i(\lambda) := \sum_{i=1}^{2}\min_{x\in\mathcal{X}}E_{\mathcal{G}^i}(\theta^i(\lambda),x) = \sum_{i=1}^{2}\min_{x\in\mathcal{X}}\langle\theta^i(\lambda),\phi(x)\rangle \tag{36}$$

is concave, but non-smooth. Its subgradient (sometimes the term supergradient is used for
concave functions) is equal to

$$\frac{\partial U}{\partial\lambda} = \sum_{i=1}^{2}\frac{\partial U^i}{\partial\lambda} = \phi_{\mathcal{V}}(x^{*1}) - \phi_{\mathcal{V}}(x^{*2}), \tag{37}$$

where $x^{*i} = \arg\min_{x\in\mathcal{X}}\langle\theta^i(\lambda),\phi(x)\rangle$, $i = 1,2$. As we already mentioned, the $x^{*i}$ are
computable by dynamic programming. This computation constitutes the basis for subgradient
algorithms for MAP inference proposed by Schlesinger and Giginyak [2007] and Komodakis et al.
[2011]. In Section 4 we will show how one can reconstruct primal estimates for this kind of
algorithm.
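The dynamic programming step can be sketched on a single chain, which is the building block for the row and column subgraphs of a grid. This is a minimal illustration under assumed data structures (lists of unary vectors and pairwise matrices); the function names `chain_argmin` and `node_subgradient` are hypothetical. The second function forms the node part of the indicator difference appearing in (37).

```python
import itertools
import random

def chain_argmin(unary, pairwise):
    """Minimise sum_i unary[i][x_i] + sum_i pairwise[i][x_i][x_{i+1}] by dynamic programming."""
    n, k = len(unary), len(unary[0])
    cost = list(unary[0])
    back = []
    for i in range(1, n):
        new_cost, ptr = [], []
        for b in range(k):
            # best predecessor label for label b at node i
            best = min(range(k), key=lambda a: cost[a] + pairwise[i - 1][a][b])
            ptr.append(best)
            new_cost.append(cost[best] + pairwise[i - 1][best][b] + unary[i][b])
        cost = new_cost
        back.append(ptr)
    last = min(range(k), key=lambda b: cost[b])
    labels = [last]
    for ptr in reversed(back):  # backtrack from the last node to the first
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return cost[last], labels

def node_subgradient(x1, x2, k):
    """Per-node indicator difference phi_V(x^{*1}) - phi_V(x^{*2})."""
    return [[(a == s) - (b == s) for s in range(k)] for a, b in zip(x1, x2)]

# Tiny check against brute force on a random 5-node chain with 3 labels.
rng = random.Random(1)
n, k = 5, 3
unary = [[rng.random() for _ in range(k)] for _ in range(n)]
pairwise = [[[rng.random() for _ in range(k)] for _ in range(k)] for _ in range(n - 1)]

def chain_energy(x):
    return (sum(unary[i][x[i]] for i in range(n))
            + sum(pairwise[i][x[i]][x[i + 1]] for i in range(n - 1)))

val, labels = chain_argmin(unary, pairwise)
brute = min(chain_energy(x) for x in itertools.product(range(k), repeat=n))
```

The subgradient vanishes at a node exactly when both subproblems agree on its label, which is the usual agreement condition of dual decomposition.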



3.2.2 Smoothed Dual Objective and Associated Gradient 

Another way to maximize $U(\lambda)$ is to smooth it first and apply efficient smooth optimization
techniques afterwards, as is done e.g. by Savchynskyy et al. [2011].

To obtain a smooth approximation, we replace min (or rather $-\max$) by the well-known
log-sum-exp (or soft-max) function [Rockafellar and Wets, 2004; Nesterov, 2004], yielding

$$U_\rho(\lambda) := \sum_{i=1}^{2}U^i_\rho(\lambda) := -\sum_{i=1}^{2}\rho\log\sum_{x\in\mathcal{X}}\exp\langle-\theta^i(\lambda)/\rho,\phi(x)\rangle \tag{38}$$

with smoothing parameter $\rho$. The function $U_\rho$ uniformly approximates $U$, as shown by
e.g. Savchynskyy et al. [2011], that is,

$$U_\rho(\lambda) + 2\rho\log|\mathcal{X}| \ \ge\ U(\lambda) \ \ge\ U_\rho(\lambda). \tag{39}$$

Please note that for acyclic graphs evaluating $U^i_\rho$ (and thus $U_\rho$) is as easy as $U^i$, and
can be done by dynamic programming.

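We introduce the vectors of "marginals" $\mu^i_\rho(\lambda)$, $i \in \{1,2\}$, by

$$\mu^i_\rho(\lambda)_{w,x_w} := \frac{\sum_{x'\in\mathcal{X}:\,x'_w = x_w}\exp\langle-\theta^i(\lambda)/\rho,\phi(x')\rangle}{\exp(-U^i_\rho(\lambda)/\rho)}, \quad w\in\mathcal{V}\cup\mathcal{E}^i. \tag{40}$$

It is well-known that the gradient of $U_\rho$ is equal to

$$\nabla U_\rho(\lambda) = \mu^1_\rho(\lambda)_{\mathcal{V}} - \mu^2_\rho(\lambda)_{\mathcal{V}}. \tag{41}$$

We refer to Savchynskyy et al. [2011, Lemma 1] for technical details.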
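The relation between the hard minimum, its log-sum-exp smoothing and the resulting marginals can be illustrated on a single small chain by plain enumeration. This is a sketch for illustration only: the enumeration replaces the dynamic programming one would use in practice, and the names `energies`, `soft_min` and `node_marginals` are hypothetical. For one subproblem the smoothing gap is at most $\rho\log|\mathcal{X}|$; the factor 2 in (39) comes from summing over the two subgraphs.

```python
import itertools
import math
import random

def energies(unary, pairwise):
    """Enumerate all labelings of a short chain together with their energies."""
    n, k = len(unary), len(unary[0])
    out = {}
    for x in itertools.product(range(k), repeat=n):
        e = sum(unary[i][x[i]] for i in range(n))
        e += sum(pairwise[i][x[i]][x[i + 1]] for i in range(n - 1))
        out[x] = e
    return out

def soft_min(values, rho):
    """-rho * log sum exp(-v / rho), computed stably by shifting with the minimum."""
    m = min(values)
    return m - rho * math.log(sum(math.exp(-(v - m) / rho) for v in values))

def node_marginals(en, rho, n, k):
    """Node marginals of the Gibbs distribution p(x) proportional to exp(-E(x)/rho)."""
    u = soft_min(list(en.values()), rho)
    mu = [[0.0] * k for _ in range(n)]
    for x, e in en.items():
        p = math.exp(-(e - u) / rho)  # exp(-E/rho) / exp(-U_rho/rho)
        for i in range(n):
            mu[i][x[i]] += p
    return mu

rng = random.Random(2)
n, k, rho = 4, 3, 0.1  # hypothetical toy sizes and smoothing parameter
unary = [[rng.random() for _ in range(k)] for _ in range(n)]
pairwise = [[[rng.random() for _ in range(k)] for _ in range(k)] for _ in range(n - 1)]

en = energies(unary, pairwise)
hard = min(en.values())
soft = soft_min(list(en.values()), rho)
bound = soft + rho * math.log(len(en))  # soft value plus rho * log|X|
mu = node_marginals(en, rho, n, k)
```

As $\rho$ decreases, `soft` approaches `hard` and the marginals concentrate on the minimizing labeling, provided the minimizer is unique.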



3.2.3 Tree-Reweighted Free Energy 

Let $N_v$ and $N_{uv}$ be the numbers of subgraphs containing node $v \in \mathcal{V}$ and edge $uv \in \mathcal{E}$ of the
graph $\mathcal{G}$. In the considered special case of the grid graph, $N_v = 2$ and $N_{uv} = 1$.



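Definition 3. The function $E_\rho : \mathcal{L} \to \mathbb{R}$ depending on a positive parameter $\rho$,

$$E_\rho(\mu) := \langle\theta,\mu\rangle - \rho\Big(-\sum_{v\in\mathcal{V}}\sum_{x_v\in\mathcal{X}_v}N_v\,\mu_{v,x_v}\log\mu_{v,x_v} - \sum_{uv\in\mathcal{E}}\sum_{x_{uv}\in\mathcal{X}_{uv}}N_{uv}\,\mu_{uv,x_{uv}}\log\frac{\mu_{uv,x_{uv}}}{\mu_{u,x_u}\mu_{v,x_v}}\Big), \tag{42}$$

is called the negative tree-reweighted free energy, introduced by Wainwright and Jordan [2004].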

The problem of minimizing $E_\rho$ on $\mathcal{L}$ is important due to the fact that it is dual to the
problem of maximizing $U_\rho$, as shown e.g. by Werner [2009]. The duality holds not only in
the considered special case of decomposition into two acyclic subgraphs, but for a
decomposition into any number of acyclic subgraphs. In contrast to $U$, whose dual (19) does not
depend on the decomposition, the function $E_\rho$ does, indeed. Maximizing $U_\rho$ with $\rho = 1$ is
used by different algorithms [Wainwright et al., 2005; Jancsary and Matz, 2011] to estimate
marginal probabilities of the underlying Gibbs distribution [Wainwright and Jordan, 2008].

The maximization delivers in the limit the same value of the objective as the minimization
of $E_\rho$. Hence it is important to compute feasible primal estimates based on dual iterates.

Another important meaning of $E_\rho$ is considering it as an approximation of the relaxed
energy $E$. This is due to the fact that the difference between (42) and (20) vanishes continuously
as $\rho$ tends to zero. This is stated precisely by the following lemma:

Lemma 3.2.1. For functions $E$ and $E_\rho$ defined respectively by (20) and (42) there exists a constant
$C_h > 0$ such that

$$E_\rho(\mu) \le E(\mu) \le E_\rho(\mu) + \rho\,C_h, \quad \mu\in\mathcal{L}, \tag{43}$$

and

$$E_\rho^* \le E^* \le E_\rho^* + \rho\,C_h \tag{44}$$

hold.



Proof. The value in brackets in (42) can be represented as a convex combination of entropies
of Gibbs distributions associated with subgraphs participating in the decomposition (34),
up to a scale factor [Wainwright et al., 2005; Werner, 2009]. The entropies are non-negative
and bounded functions on $\mathcal{L}$. Their convex combination is non-negative and bounded as well,
which proves the statement of the lemma. □

We will employ Lemma 3.2.1 in Section 4.



Optimizing Projection w.r.t. Tree-Reweighted Free Energy. The negative free energy
$E_\rho$ is separable w.r.t. $\mu_{\mathcal{V}}$ and $\mu_{\mathcal{E}}$ like the MAP-energy function $E$. Since the underlying
constraint set, the local polytope, is the same, the definition of the optimizing projection
w.r.t. the negative free energy $E_\rho$ differs only slightly from the one for the MAP-energy $E$.
Namely, the optimizing projection $\mathcal{P}_{E_\rho,\mathcal{L}}$ maps $(\mu_{\mathcal{V}},\mu_{\mathcal{E}})$ to $(\mu'_{\mathcal{V}},\mu'_{\mathcal{E}}) \in \mathcal{L}$,
defined as

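$$\mu'_v = \Pi_{\Delta(|\mathcal{X}_v|)}(\mu_v), \quad v\in\mathcal{V}, \tag{45}$$

$$\mu'_{uv} = \arg\min_{\mu_{uv}\ge 0}\ \langle\theta_{uv} - \rho N_{uv}\log(\mu'_u\mu'_v),\mu_{uv}\rangle + \rho N_{uv}\langle\mu_{uv},\log\mu_{uv}\rangle \quad \text{s.t. } A_{uv}\mu_{uv} = \mu'_{\mathcal{V}}, \quad uv\in\mathcal{E}. \tag{46}$$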

The only difference of $\mathcal{P}_{E_\rho,\mathcal{L}}$ to $\mathcal{P}_{E,\mathcal{L}}$ defined by (23)-(24) is the objective function. It is
not linear anymore as in (24); rather, (46) constitutes a small-sized entropy minimization
problem, which can be solved e.g. by interior point methods.

As before, we apply Theorem 2.0.1 and Lemma 2.0.2 to get an idea about the convergence
rate of the projected sequence $\mathcal{P}_{E_\rho,\mathcal{L}}(\mu^t_{\mathcal{V}},\mu^t_{\mathcal{E}})$, $t = 1,\dots,\infty$, in comparison to the convergence
rate of the original, infeasible sequence $\mu^t$. According to Lemma 2.0.2 the estimate of
the Lipschitz constant for the function (46) becomes bad (the constant becomes big) when
some coordinate of $\mu$ vanishes. However, the increase of the Lipschitz constant is only
logarithmic w.r.t. the precision $\varepsilon$ which has to be attained. Hence, the role of the entropy terms
$\mu_{uv,x_{uv}}\log\mu_{uv,x_{uv}}$ in slowing down the convergence of feasible primal estimates is typically
insignificant.
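The node part (45) of the projection can be sketched as follows, assuming that it denotes the Euclidean projection of each node vector onto the probability simplex. The sort-based routine below is a standard way to compute such a projection; it is shown as an illustration and is not claimed to be the implementation used for the experiments. The name `project_to_simplex` is hypothetical.

```python
def project_to_simplex(y):
    """Euclidean projection of y onto {m : m >= 0, sum(m) = 1} via sorting."""
    u = sorted(y, reverse=True)
    css = 0.0  # running sum of the largest entries
    tau = 0.0  # shift that makes the positive part sum to one
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:  # entry j stays positive after shifting by t
            tau = t
    return [max(v - tau, 0.0) for v in y]

p = project_to_simplex([1.2, -0.1, 0.4])
q = project_to_simplex([0.2, 0.3, 0.5])  # already feasible, should be unchanged
```

Once the node vectors are feasible, each edge subproblem only has to match the two projected node marginals, so the edge problems decouple and stay small.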



4 Application to Algorithmic Schemes 

In previous sections we concentrated on the way to compute the optimizing projection,
assuming that the weakly converging (but infeasible) sequence is given. In contrast, this section is
devoted to the methods of obtaining such sequences within different optimization algorithms.

Saddle-point Primal-Dual Algorithms. In the simplest case the (infeasible) optimizing
sequences for the primal (25) and dual (26) problems are generated by an algorithm itself, as
is typical for algorithms based on a primal-dual saddle-point formulation. A striking example of
such an approach is the First Order Primal-Dual Algorithm (FPD) by Chambolle and Pock
[2010], which was recently applied to the local polytope relaxation of the MAP problem
by Schmidt et al. [2011].

The pair (25)-(26) is cast as a saddle point problem via their Lagrangian,

$$\max_{\mu\ge 0}\min_{\nu}\ \{\langle -b,\nu\rangle + \langle\mu,A^\top\nu\rangle - \langle\theta,\mu\rangle\}. \tag{47}$$

The FPD algorithm iteratively updates the primal $\mu^t$ and dual $\nu^t$ approximate
solutions and guarantees their weak convergence to the optimum of the primal (25) and
dual (26) problems, respectively. However, the iterates $\mu^t$ and $\nu^t$ are not feasible in general
and hence computing a duality gap requires their projection to the feasible sets. We do this
by computing $\mathcal{P}_{E,\mathcal{L}}(\mu^t_{\mathcal{V}},\mu^t_{\mathcal{E}})$, as defined by (23)-(24), and $\mathcal{P}_{U,\mathcal{D}}(\nu^t)$, defined by (31)-(33).



Subgradient Ascent. One of the first optimization algorithms with convergence guarantees
for the dual decomposition based objective $U$ defined in (36) was subgradient ascent proposed
by Schlesinger and Giginyak [2007] and Komodakis et al. [2007]. It produces the sequence

$$\lambda^{t+1} = \lambda^t + \tau^t\,\frac{\partial U}{\partial\lambda}(\lambda^t), \tag{48}$$

where $\tau^t$ is a positive step-size fulfilling the conditions

$$\tau^t \to 0, \qquad \sum_{t=1}^{\infty}\tau^t = \infty. \tag{49}$$



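It is shown by Larsson et al. [1999] and later applied by Komodakis et al. [2011] that
time-averaged optimal labelings $\phi(x^{*i,t})$, $i = 1,2$ (see (37)) converge to the primal solution
of the relaxed MAP-inference problem (19). This implies that there is an optimal solution
$\mu^*$ of (19), such that

$$\forall i,\ w\in\mathcal{V}\cup\mathcal{E}^i:\quad \mu^t_w := \frac{1}{t}\sum_{k=1}^{t}\phi_w(x^{*i,k}) \xrightarrow{t\to\infty} \mu^*_w. \tag{50}$$

The same convergence guarantee holds also for weighted averaging with step sizes $\tau^t$:

$$\forall i,\ w\in\mathcal{V}\cup\mathcal{E}^i:\quad \mu^t_w := \frac{\sum_{k=1}^{t}\tau^k\phi_w(x^{*i,k})}{\sum_{k=1}^{t}\tau^k} \xrightarrow{t\to\infty} \mu^*_w. \tag{51}$$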



Neither of the sequences $\mu^t$, the one defined in (50) or that of (51), is feasible in
general. Hence one has to apply the optimizing projection $\mathcal{P}_{E,\mathcal{L}}$ defined by (23)-(24) to make
them feasible. Please note that an explicit form of $\mu^t_{\mathcal{E}}$ is not important for this operation,
since $\mathcal{P}_{E,\mathcal{L}}$ does not actually depend on it. Taking into account that the dual variables $\lambda^t$
are unconstrained and hence feasible without any projection, one can directly estimate the
duality gap as $E(\mathcal{P}_{E,\mathcal{L}}(\mu^t_{\mathcal{V}},\mu^t_{\mathcal{E}})) - U(\lambda^t)$.
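The two averaging schemes, uniform and weighted by step sizes, can be maintained incrementally without storing past labelings. The following is a minimal sketch under the assumption that each iteration delivers an indicator vector of the current optimal labeling; the function name `averaged_estimates` and the toy data are hypothetical.

```python
def averaged_estimates(indicator_seq, step_sizes):
    """Return (uniform, step-size weighted) running averages of indicator vectors."""
    dim = len(indicator_seq[0])
    uniform = [0.0] * dim
    weighted = [0.0] * dim
    total_w = 0.0
    for t, (phi, tau) in enumerate(zip(indicator_seq, step_sizes), start=1):
        total_w += tau
        for j in range(dim):
            uniform[j] += (phi[j] - uniform[j]) / t  # mean over iterations 1..t
            weighted[j] += tau * (phi[j] - weighted[j]) / total_w  # tau-weighted mean
    return uniform, weighted

# Indicators of one binary node over three iterations, step sizes tau^t = 1/t.
seq = [[1, 0], [0, 1], [1, 0]]
taus = [1.0 / t for t in range(1, 4)]
uni, wei = averaged_estimates(seq, taus)
```

The choice $\tau^t = 1/t$ used here is one simple sequence satisfying (49). Both averages are fractional and, across nodes and edges, generally inconsistent, which is why the optimizing projection is applied afterwards.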



Methods Based on Smoothing, Tree-Reweighted Primal Bound [Savchynskyy et al.,
2011, 2012; Jancsary and Matz, 2011; Hazan and Shashua, 2010]. Reconstructing
a primal sequence for gradient-based methods optimizing the smoothed objective $U_\rho$ is
similar to (50), but does not even require an averaging of gradients over iterations to obtain
convergence.

In particular, the vector $\mu_\rho(\lambda^t)$ defined by (40) converges to an optimum $\hat\mu^*$ of the
tree-reweighted free energy $E_\rho$ as $\lambda^t$ approaches an optimum $\lambda^*$ of the smoothed dual function:

$$\forall i,\ w\in\mathcal{V}\cup\mathcal{E}^i:\quad \mu^i_\rho(\lambda^t)_w \xrightarrow{t\to\infty} \hat\mu^*_w. \tag{52}$$

Analogous to the subgradient optimization we can apply the optimizing mapping $\mathcal{P}_{E_\rho,\mathcal{L}}$
to get a feasible primal estimate without knowing an explicit expression for $\mu_{\mathcal{E}}$, since the
optimizing projection does not depend on its second argument. As in the non-smooth case,
since the $\lambda^t$ are unconstrained, the value $E_\rho(\mathcal{P}_{E_\rho,\mathcal{L}}(\mu^t_{\mathcal{V}},\mu^t_{\mathcal{E}})) - U_\rho(\lambda^t)$ is the duality gap.

Remark 4.0.1. If the final objective of the optimization is not the tree-reweighted free energy
$E_\rho$, but the MAP-energy $E$, and the smoothing is used as an optimization tool to speed
up convergence, one can obtain even better primal bounds for a lesser computational cost.
Namely, due to (43) the optimizing projection $\mathcal{P}_{E,\mathcal{L}}$ can be applied to approximate the
optimal solution of the MAP-energy $E$. Denote

$$\hat\mu' = (\hat\mu'_{\mathcal{V}},\hat\mu'_{\mathcal{E}}) = \mathcal{P}_{E_\rho,\mathcal{L}}(\mu_{\mathcal{V}},\mu_{\mathcal{E}}) \quad\text{and}\quad \mu' = (\mu'_{\mathcal{V}},\mu'_{\mathcal{E}}) = \mathcal{P}_{E,\mathcal{L}}(\mu_{\mathcal{V}},\mu_{\mathcal{E}}). \tag{53}$$

From the definitions (23) and (45) it follows that $\hat\mu'_{\mathcal{V}} = \mu'_{\mathcal{V}}$, and thus due to (24) and (46)
$E(\mu') \le E(\hat\mu')$. This means that the projection $\mathcal{P}_{E,\mathcal{L}}$ is preferable for approximating the
minimum of $E$ over $\mathcal{L}$ even in the case when the smooth objective $U_\rho$ was optimized and
not the original non-smooth $U$. As an additional benefit, one obtains faster convergence of
the projection even from the worst-case analysis, due to a better estimate of the Lipschitz
constant for the function $E$ compared to the function $E_\rho$, as estimated in Lemmas 2.0.1
and 2.0.2.



Bundle methods, ADLP, ADMM, TRWS, MPLP and others. Analogous converging
primal sequences can be constructed for other optimization approaches as well. For
bundle methods, recently applied to MAP-inference by Kappes et al. [2012], one has to
average the resulting optimal labelings $\phi(x^{*i,t})$, $t = 1,\dots,\infty$, with weights obtained from
the solution of the auxiliary problem, see [Kappes et al., 2012, eq. 23].

Other examples are augmented Lagrangian based optimization schemes, which were
recently applied to the MAP inference problem by Martins et al. [2011]; Meshi and Globerson
[2011]. These algorithms augment the Lagrangian (47) with a quadratic term and combine
coordinate descent in the primal domain with subgradient steps in the dual one or vice versa.
An important property of these schemes is that they maintain (in general) infeasible primal
and dual estimates, which can be projected to the feasible sets with optimizing projections
similar to the described $\mathcal{P}_{E,\mathcal{L}}$ and $\mathcal{P}_{U,\mathcal{D}}$.

However, we are not aware of methods for reconstructing primal solutions of the relaxed
problem from dual estimates for non-smooth coordinate descent based schemes like TRW-S
by Kolmogorov [2006] and MPLP by Globerson and Jaakkola [2007]. Indeed, these schemes
do not solve the relaxed MAP problem in general, hence even if one had such a method
at hand, it would not guarantee convergence of the primal estimates to the optimum.



5 Experimental Analysis and Evaluation 

The main goal of this section is to show how Theorem 2.0.1 works in practice. Hence we
provide only two groups of experiments to evaluate our method. Both concentrate on
reconstructing feasible primal estimates for the MAP inference algorithms considered in Section 4.
In the first group we show how the projected primal MAP-solution converges to the optimum
for a series of algorithms. In the second one we show how the bound provided by Theorem 2.0.1
allows for at least qualitative prediction of the objective value in the (feasible) projected point.
We refer to our conference papers [Savchynskyy et al., 2011; Schmidt et al., 2011; Kappes et al.,
2012; Savchynskyy et al., 2012] for the experiments with an extended set of benchmark data.



For the experiments we employ our own implementations of the First Order Primal Dual
Algorithm (acronym FPD) as described by Schmidt et al. [2011], the adaptive diminishing
smoothing algorithm ADSAL proposed by Savchynskyy et al. [2012], the dual decomposition
based subgradient ascent with an adaptive step-size rule according to [Kappes et al., 2012,
eq. 17] and primal estimates based on averaged (50) (acronym SG-AVE) and weighted
averaged (51) (acronym SG-WEI) subgradients, and finally Nesterov's accelerated gradient
ascent method applied to the smoothed dual decomposition based objective (38) (acronym
NEST) studied by Savchynskyy et al. [2011]. All implementations are based on data
structures of the OpenGM library by Andres et al. [2012].

The optimizing projection to the local polytope w.r.t. the MRF energy (23)-(24) is
computed using our implementation of a specialization of the simplex algorithm for
transportation problems [Bazaraa and Jarvis, 1977]. We adopted an elegant method by Bland
[1976], also discussed by Papadimitriou and Steiglitz [1998], to avoid cycling. The source
code of the solver can be downloaded from the first author's web-site.



Feasible Primal Bound Estimation. In the first series, we demonstrate that for all three
groups of methods discussed in Section 4 our method efficiently provides feasible primal
estimates for the MAP inference problem (19). To this end we generated a 256 x 256 grid
model with 4 variable states ($|\mathcal{X}_v| = 4$) and potentials randomly distributed in the interval
[0, 1]. We solved an LP relaxation of the MAP inference problem with FPD as a
representative of methods dealing with infeasible primal estimates, the subgradient methods SG-AVE
and SG-WEI, and ADSAL as the fastest representative of smoothing-based algorithms. The
corresponding plots are presented in Fig. 2. We note that in all experiments the time needed
to compute the optimizing projection $\mathcal{P}_{E,\mathcal{L}}$ did not exceed the time needed to compute the
subgradient/gradient of the dual function $U$/$U_\rho$ and amounted to 0.01-0.02 s on a 3 GHz machine.
The generated dataset is not LP tight, hence the obtained relaxed primal solution has a 
significantly smaller energy than the integer one. In contrast to the cases where only non- 
relaxed integer primal estimates are computed, the primal and dual bounds of the relaxed 
problem converge to the same limit value. Due to the feasibility of both primal and dual 
estimates, the primal and dual objective functions' values bound the optimal value of the 
relaxed problem from above and below, respectively. 
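The bracketing of the optimum by a dual value from below and a primal value from above can be illustrated on a small random grid model of the same type as in this experiment (4 labels, potentials uniform in [0, 1]). The sketch below is a simplification with hypothetical names: it uses an 8 x 8 grid instead of 256 x 256, evaluates the dual decomposition bound only at $\lambda = 0$, and takes the energy of a naive integer labeling as the upper bound rather than a projected relaxed solution.

```python
import random

def chain_min(unary, pairwise):
    """Minimal energy of a chain by dynamic programming (value only)."""
    cost = list(unary[0])
    for i in range(1, len(unary)):
        cost = [min(cost[a] + pairwise[i - 1][a][b] for a in range(len(cost))) + unary[i][b]
                for b in range(len(unary[i]))]
    return min(cost)

def random_grid(h, w, k, rng):
    """Unary and pairwise potentials drawn uniformly from [0, 1]."""
    mat = lambda: [[rng.random() for _ in range(k)] for _ in range(k)]
    unary = [[[rng.random() for _ in range(k)] for _ in range(w)] for _ in range(h)]
    horiz = [[mat() for _ in range(w - 1)] for _ in range(h)]  # horiz[r][c]: (r,c)-(r,c+1)
    vert = [[mat() for _ in range(h - 1)] for _ in range(w)]   # vert[c][r]: (r,c)-(r+1,c)
    return unary, horiz, vert

def dual_value_at_zero(unary, horiz, vert):
    """U(0): rows and columns are solved independently with halved unaries."""
    h, w = len(unary), len(unary[0])
    half = [[[t / 2 for t in unary[r][c]] for c in range(w)] for r in range(h)]
    rows = sum(chain_min(half[r], horiz[r]) for r in range(h))
    cols = sum(chain_min([half[r][c] for r in range(h)], vert[c]) for c in range(w))
    return rows + cols

def labeling_energy(unary, horiz, vert, x):
    """Energy of an integer labeling x[r][c] on the full grid."""
    h, w = len(unary), len(unary[0])
    e = sum(unary[r][c][x[r][c]] for r in range(h) for c in range(w))
    e += sum(horiz[r][c][x[r][c]][x[r][c + 1]] for r in range(h) for c in range(w - 1))
    e += sum(vert[c][r][x[r][c]][x[r + 1][c]] for c in range(w) for r in range(h - 1))
    return e

rng = random.Random(0)
unary, horiz, vert = random_grid(8, 8, 4, rng)
# Naive labeling: pick the best unary label at every node.
x = [[unary[r][c].index(min(unary[r][c])) for c in range(8)] for r in range(8)]
lower = dual_value_at_zero(unary, horiz, vert)
upper = labeling_energy(unary, horiz, vert, x)
```

Maximizing over $\lambda$ raises `lower`, and replacing the integer labeling by a projected relaxed solution lowers `upper`; on models that are not LP tight only the latter pair of bounds meets.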



Evaluation of Convergence Bound. The second series of experiments is devoted to
the evaluation of the convergence bounds provided by Theorem 2.0.1. To this end, we
generated four LP-tight grid-structured datasets with known optimal labeling. We refer
to [Schmidt et al., 2011, pp. 95-96] for a description of the generation process. The resulting
unary and pairwise potentials were distributed in the interval [-10, 10]. We picked a
random subset of edges not belonging to the optimal labeling and assigned them "infinite"
values. We created four datasets with "infinities" equal to 10 000, 100 000, 1 000 000 and
10 000 000 and ran NEST for inference. According to Theorem 2.0.1 the energy $E$ evaluated
on projected feasible estimates $\mathcal{P}_{E,\mathcal{L}}(\mu^t_{\mathcal{V}},\mu^t_{\mathcal{E}})$, $t = 1,\dots,\infty$, where the $\mu^t$ were reconstructed
from dual estimates according to (52), can be represented as

$$E(\mathcal{P}_{E,\mathcal{L}}(\mu^t_{\mathcal{V}},\mu^t_{\mathcal{E}})) = F(\mu^t) + L_Y(E)\,\|\mu^t - \Pi_{\mathcal{L}}\mu^t\| \tag{54}$$

for a suitably selected function $F$. Since NEST is a purely dual method and "infinite"
pairwise potentials did not make any significant contribution to values and gradients of the
(smoothed) dual objective, the infeasible primal estimates $\mu^t$ were the same for all four
different approximations of the infinity value. Since according to Lemma 2.0.1 the Lipschitz
constant $L_Y(E)$ is asymptotically proportional to the values of the binary potentials $\theta_{\mathcal{E}}$, we
plotted the values $\log E(\mathcal{P}_{E,\mathcal{L}}(\mu^t_{\mathcal{V}},\mu^t_{\mathcal{E}}))$ as a function of $t$ for all four datasets in Fig. 3.
As predicted by Theorem 2.0.1 the corresponding energy values differ by approximately a
factor of 10, as the "infinite" values do. Due to the logarithmic energy scale this difference
corresponds to equal log-energy distances between the curves in Fig. 3.

Footnote: the solver source code is available at http://hci.iwr.uni-heidelberg.de/Staff/bsavchyn/software.php

Figure 2: Convergence of the primal (dashed lines) and dual (solid lines) bounds to the
same optimal limit value for ADSAL and FPD algorithms (left) and SG-AVE and SG-WEI
(right). The obtained integer bound is plotted as a dotted line. Note that due to the
feasibility of both primal and dual estimates, the primal and dual objective functions' values
bound the optimal value of the relaxed problem from above and below, respectively.



6 Conclusions 

We presented an efficient and quite general optimizing projection method for computing
feasible primal estimates for dual and primal-dual optimization schemes. The method provides
convergence guarantees similar to those of the Euclidean projection but, in contrast to it,
allows for efficient computations when the feasible set and the objective function possess
certain separability properties. Like any optimization tool it also has certain limitations related
to the Lipschitz continuity of the primal objective; however, exactly the same limitations are
characteristic also for the Euclidean projection. Hence they cannot be considered as
disadvantages of this particular method, but rather as disadvantages of all projection methods
in general, and they can be overcome only by constructing algorithms which intrinsically maintain
feasible primal estimates during iterations. The construction of such algorithms has to be
addressed in future work.

Acknowledgement. This work has been supported by the German Research Foundation
(DFG) within the program "Spatio-/Temporal Graphical Models and Applications in Image
Analysis", grant GRK 1653.

Figure 3: Convergence of the obtained primal feasible solution for four datasets which differ
only by the values used as "infinity". The energy values are plotted in logarithmic scale. From
bottom to top: optimal log-energy, primal bounds corresponding to infinity values equal to
10 000, 100 000, 1 000 000 and 10 000 000. Please note that as predicted by Theorem 2.0.1
and Lemma 2.0.1 the distance between corresponding log-energies remains approximately the
same for all time steps and is equal to log 10, which corresponds to the multiplication factor
determining the relation between different values of "infinity".